Edge-computing-based 3D perception has received attention in intelligent transportation systems (ITS) because real-time monitoring of traffic participants can strengthen Vehicle-to-Everything (V2X) orchestration. Thanks to LiDAR's ability to precisely measure the depth of the surroundings, a growing number of studies focus on LiDAR-based 3D detection, which has significantly advanced 3D perception. However, few methods meet the real-time requirements of edge deployment because of their computation-intensive operations. Moreover, an inconsistency problem in object detection remains unexplored in the point-cloud domain due to its large sparsity. Inspired by recent works that identify the inconsistency problem in the image domain, this paper thoroughly analyzes the problem for point clouds and proposes a 3D harmonic loss function to mitigate inconsistent predictions. We further demonstrate the feasibility of the 3D harmonic loss from a mathematical optimization perspective. Simulations on the KITTI and DAIR-V2X-I datasets show that our proposed method considerably improves performance over benchmark models, and a simulated deployment on an edge device (Jetson Xavier TX) validates the model's efficiency. Our code is open source and publicly available.
Image retouching, which aims to regenerate a visually pleasing rendition of a given image, is a subjective task for users with diverse aesthetic preferences. Most existing methods deploy deterministic models that learn a retouching style from a specific expert, making them less flexible in meeting diverse subjective preferences. Moreover, the intrinsic diversity of an expert, arising from the targeted processing of different images, is also left undescribed. To circumvent these issues, we propose to learn diverse image retouching with a flow-based architecture. Unlike current flow-based methods that directly generate the output image, we argue that learning in a style domain can (i) disentangle the retouching style from the image content, (ii) lead to a stable style presentation, and (iii) avoid spatial disharmony effects. To obtain a meaningful image tone style representation, a joint training pipeline is designed, consisting of a style encoder, a conditional retouching network, and an image tone style normalizing flow (TSFlow) module. In particular, the style encoder predicts the target style representation of the input image, which serves as conditional information for the retouching network, while TSFlow maps the style representation vector into a Gaussian distribution in the forward pass. After training, TSFlow can generate diverse image tone style vectors by sampling from the Gaussian distribution. Extensive experiments on the MIT-Adobe FiveK and PPR10K datasets show that our proposed method performs favorably against state-of-the-art methods and is effective at generating diverse results to satisfy different human aesthetic preferences. Source code and pre-trained models are publicly available at https://github.com/ssrheart/tsflow.
In typical scenarios where the federated learning (FL) framework applies, it is common for clients to lack sufficient training data to produce an accurate model. Thus, models that provide not only point estimates but also some notion of confidence are beneficial. Gaussian processes (GPs) are powerful Bayesian models that come with naturally calibrated variance estimates. However, learning a stand-alone global GP is challenging, since merging local kernels leads to privacy leakage. To preserve privacy, previous works on federated GPs avoid learning a global model by focusing on the personalized setting or by learning an ensemble of local models. We present Federated Bayesian Neural Regression (FedBNR), an algorithm that learns a scalable, stand-alone global federated GP while respecting clients' privacy. We combine deep kernel learning and random features for scalability by defining a unifying random kernel. We show that this random kernel can recover any stationary kernel and many non-stationary kernels. We then derive a principled approach for learning a global predictive model as if all client data were centralized. We also learn global kernels with knowledge distillation methods for non-identically and independently distributed (non-i.i.d.) clients. Experiments are conducted on real-world regression datasets and show statistically significant improvements compared to other federated GP models.
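The random-features idea behind the unifying random kernel can be illustrated with classic random Fourier features, which approximate a stationary RBF kernel by an explicit finite-dimensional feature map. This is a minimal sketch of that general technique, not the paper's exact construction; all names and parameter values here are illustrative.

```python
import numpy as np

def rff_features(X, n_features=2000, lengthscale=1.0, seed=0):
    """Random Fourier features: z(x) @ z(y) approximates an RBF kernel k(x, y)
    (Rahimi & Recht). The feature map is explicit, so clients could share
    feature statistics instead of raw kernel matrices."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rbf_kernel(X, Y, lengthscale=1.0):
    """Exact RBF kernel for comparison."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_features(X)
K_approx = Z @ Z.T      # inner products of random features
K_exact = rbf_kernel(X, X)
```

Because the map is explicit, sufficient statistics such as `Z.T @ Z` can in principle be aggregated across clients without exchanging raw data, which hints at why random features help in the federated setting.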
Controllable Text Generation (CTG) is an emerging area in the field of natural language generation (NLG). It is regarded as crucial for developing advanced text generation technologies that are more natural and better meet the specific constraints of practical applications. In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing generation of more diverse and fluent text. However, due to the limited interpretability of deep neural networks, the controllability of these methods needs to be guaranteed. To this end, controllable text generation using transformer-based PLMs has become a rapidly growing yet challenging research hotspot. A diverse range of approaches has emerged in the past 3-4 years, targeting different CTG tasks that may require different types of controlled constraints. In this paper, we present a systematic critical review of the common tasks, main approaches, and evaluation methods in this area. Finally, we discuss the challenges the field is facing and put forward various promising future directions. To the best of our knowledge, this is the first survey paper to summarize CTG techniques from the perspective of PLMs. We hope it can help researchers in related fields quickly track the academic frontier, providing them with a landscape of the area and a roadmap for future research.
Existing unpaired low-light image enhancement approaches tend to adopt two-way GAN frameworks, in which two CNN generators are deployed for enhancement and degradation separately. However, such data-driven models ignore the inherent characteristics of the transformation between low-light and normal-light images, leading to unstable training and artifacts. Here, we propose to leverage an invertible network to enhance low-light images in the forward process and to degrade normal-light images inversely, with unpaired learning. The generated and real images are then fed into discriminators for adversarial learning. Besides the adversarial loss, we design various loss functions to ensure training stability and preserve more image details. In particular, a reversibility loss is introduced to alleviate the over-exposure problem. Moreover, we present a progressive self-guided enhancement process for low-light images, achieving favorable performance against state-of-the-art methods.
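The forward-enhancement/inverse-degradation design relies on a network that is invertible by construction. As a minimal sketch of this general building block (an affine coupling layer in the style of RealNVP, not the paper's specific architecture), the scale/shift "networks" below are tiny random linear maps purely for illustration:

```python
import numpy as np

class AffineCoupling:
    """Affine coupling layer: half the input passes through unchanged and
    parameterizes an affine transform of the other half, so the layer is
    exactly invertible. The s/t maps here are illustrative linear stand-ins
    for real sub-networks."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        half = dim // 2
        self.Ws = rng.normal(scale=0.1, size=(half, dim - half))
        self.Wt = rng.normal(scale=0.1, size=(half, dim - half))

    def forward(self, x):
        half = self.Ws.shape[0]
        x1, x2 = x[:half], x[half:]
        s, t = x1 @ self.Ws, x1 @ self.Wt
        y2 = x2 * np.exp(s) + t          # affine transform conditioned on x1
        return np.concatenate([x1, y2])

    def inverse(self, y):
        half = self.Ws.shape[0]
        y1, y2 = y[:half], y[half:]
        s, t = y1 @ self.Ws, y1 @ self.Wt
        x2 = (y2 - t) * np.exp(-s)       # exact analytic inverse
        return np.concatenate([y1, x2])
```

Exact invertibility is what allows one set of parameters to implement enhancement in the forward direction and degradation in the inverse direction.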
Emotion-cause pair extraction (ECPE), an emergent natural language processing task, aims to jointly investigate emotions and their underlying causes in documents. It extends the earlier emotion cause extraction (ECE) task, but without requiring a set of pre-given emotion clauses as in ECE. Existing approaches to ECPE generally adopt a two-stage method: (1) emotion and cause detection, and then (2) pairing the detected emotions and causes. Such a pipeline method, while intuitive, suffers from two critical issues: error propagation across stages, which may hinder effectiveness, and high computational cost, which limits the method's practical application. To tackle these issues, we propose a multi-task learning model that extracts emotions, causes, and emotion-cause pairs simultaneously in an end-to-end manner. Specifically, our model regards pair extraction as a link prediction task and learns to link from emotion clauses to cause clauses, i.e., the links are directional. Emotion extraction and cause extraction are incorporated into the model as auxiliary tasks, which further boost pair extraction. Experiments are conducted on an ECPE benchmark dataset. The results show that our proposed model outperforms a range of state-of-the-art approaches.
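The link-prediction view of pair extraction can be sketched as scoring every directed emotion-to-cause clause pair and keeping those above a threshold. This is a hedged illustration of the general idea only; the bilinear scorer, its weight matrix `W`, and the threshold are hypothetical stand-ins for components the actual model would learn end-to-end.

```python
import numpy as np

def extract_pairs(emotion_vecs, cause_vecs, W, threshold=0.5):
    """Score directed links emotion_i -> cause_j with a bilinear form plus
    a sigmoid, then keep pairs whose probability exceeds the threshold.
    emotion_vecs: (n_e, d), cause_vecs: (n_c, d), W: (d, d)."""
    logits = emotion_vecs @ W @ cause_vecs.T           # (n_e, n_c) scores
    probs = 1.0 / (1.0 + np.exp(-logits))              # link probabilities
    return [(i, j)
            for i in range(probs.shape[0])
            for j in range(probs.shape[1])
            if probs[i, j] > threshold]
```

Note the asymmetry: swapping the roles of emotion and cause representations generally changes the score, which is how directionality is expressed.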
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive a tractable reformulation of our model. In particular, we show that the return-risk model can also account for risk from an uncertain transition kernel when one seeks only deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
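In notation that is illustrative rather than the paper's own, the weighted mean-percentile objective over a Wasserstein ambiguity set can be sketched as:

```latex
\max_{\pi \in \Pi} \;
\inf_{P \in \mathcal{P}}
\Big(
  \lambda \, \mathbb{E}_{r \sim P}\!\big[\rho_\pi(r)\big]
  \;+\;
  (1-\lambda)\, \mathrm{VaR}^{P}_{\alpha}\!\big(\rho_\pi(r)\big)
\Big),
\qquad
\mathcal{P} = \big\{ P : W(P, \widehat{P}) \le \theta \big\},
```

where $\rho_\pi(r)$ is the return of policy $\pi$ under reward realization $r$, $\lambda \in [0,1]$ weights the mean against the $\alpha$-percentile ($\mathrm{VaR}_\alpha$), $W$ is the Wasserstein distance, $\widehat{P}$ the nominal reward distribution, and $\theta$ the ambiguity radius. Setting $\lambda = 1$ recovers a distributionally robust MDP, while $\lambda = 0$ recovers the percentile (chance-constrained) case, consistent with the special cases listed above.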
Semi-supervised learning (SSL) has achieved great success in leveraging a large amount of unlabeled data to learn a promising classifier. A popular approach is pseudo-labeling, which generates pseudo labels only for unlabeled data with high-confidence predictions. As for the low-confidence ones, existing methods often simply discard them because their unreliable pseudo labels may mislead the model. Nevertheless, we highlight that data with low-confidence pseudo labels can still benefit the training process. Specifically, although the class with the highest probability in the prediction is unreliable, we can assume that the sample is very unlikely to belong to the classes with the lowest probabilities. Such data can thus be very informative if we effectively exploit these complementary labels, i.e., the classes that a sample does not belong to. Inspired by this, we propose a novel Contrastive Complementary Labeling (CCL) method that constructs a large number of reliable negative pairs based on the complementary labels and adopts contrastive learning to make use of all the unlabeled data. Extensive experiments demonstrate that CCL significantly improves performance on top of existing methods. More critically, CCL is particularly effective in label-scarce settings: for example, it yields an improvement of 2.43% over FixMatch on CIFAR-10 with only 40 labeled samples.
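The complementary-labeling step can be sketched as: for each sample whose top-1 probability falls below the pseudo-labeling confidence threshold, keep its lowest-probability classes as classes it almost surely does not belong to. This is a minimal sketch of the idea only; the threshold and the number of complementary labels `k` are illustrative choices, not the paper's settings.

```python
import numpy as np

def complementary_labels(probs, conf_threshold=0.95, k=3):
    """For samples too unreliable for a positive pseudo label (top-1
    probability below conf_threshold), return the k lowest-probability
    classes as complementary labels. probs: (n_samples, n_classes)."""
    comp = {}
    for i, p in enumerate(probs):
        if p.max() < conf_threshold:
            comp[i] = set(np.argsort(p)[:k])   # k least likely classes
    return comp
```

Reliable negative pairs can then be formed between such a sample and examples belonging to its complementary classes, which is what makes a contrastive objective applicable to all the unlabeled data.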
In this paper, we present a large-scale, multi-source, and unconstrained database called SDFE-LV for spotting the onset and offset frames of complete dynamic facial expressions in long videos, a task known as dynamic facial expression spotting (DFES) and a vital prior step for many facial expression analysis tasks. Specifically, SDFE-LV consists of 1,191 long videos, each of which contains one or more complete dynamic facial expressions. Moreover, each complete dynamic facial expression in its corresponding long video was independently labeled five times by ten well-trained annotators. To the best of our knowledge, SDFE-LV is the first unconstrained large-scale database for the DFES task whose long videos are collected from multiple real-world or close-to-real-world media sources, e.g., TV interviews, documentaries, movies, and we-media short videos. Therefore, DFES tasks on the SDFE-LV database will encounter many practical difficulties, such as head pose changes, occlusions, and illumination variation. We also provide comprehensive benchmark evaluations from different perspectives using several state-of-the-art deep spotting methods, so that researchers interested in DFES can get started quickly and easily. Finally, through in-depth discussion of the experimental evaluation results, we point out several meaningful directions for tackling the DFES task and hope that DFES can advance further in the future. In addition, SDFE-LV will be freely released for academic use only, as soon as possible.
One-shot voice conversion (VC), where only a single utterance from the target speaker is available for reference, has become a hot research topic. Existing works usually disentangle timbre, while information about pitch, rhythm, and content remains mixed together. To further disentangle these speech components and perform one-shot VC effectively, we employ random resampling for the pitch and content encoders, and use the variational contrastive log-ratio upper bound of mutual information together with gradient-reversal-layer-based adversarial mutual information learning to ensure that the latent space of each part contains only the desired disentangled representation during training. Experiments on the VCTK dataset show that the model achieves state-of-the-art performance for one-shot VC in terms of naturalness and intelligibility. In addition, through speech representation disentanglement, we can separately transfer the timbre, pitch, and rhythm characteristics in one-shot VC. Our code, pre-trained models, and demos are available at https://im1eon.github.io/IS2022-SRDVC/.
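Random resampling, as commonly used for disentanglement in VC, splits a feature sequence into segments and stretches or squeezes each by a random factor, perturbing rhythm and duration cues while preserving content order. The sketch below illustrates that general operation with linear interpolation; the segment count and ratio range are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def random_resample(seq, n_segments=4, min_ratio=0.5, max_ratio=1.5, seed=0):
    """Split a (T, D) feature sequence into segments and linearly resample
    each by a random ratio, so encoders cannot rely on absolute durations.
    Returns the concatenated, length-perturbed sequence."""
    rng = np.random.default_rng(seed)
    out = []
    for seg in np.array_split(seq, n_segments):
        ratio = rng.uniform(min_ratio, max_ratio)
        new_len = max(1, int(round(len(seg) * ratio)))
        src = np.linspace(0.0, len(seg) - 1, new_len)   # fractional indices
        idx0 = np.floor(src).astype(int)
        idx1 = np.minimum(idx0 + 1, len(seg) - 1)
        frac = (src - idx0)[:, None]
        out.append(seg[idx0] * (1 - frac) + seg[idx1] * frac)
    return np.concatenate(out)
```

Applied at the input of the pitch and content encoders, such perturbations discourage those branches from encoding rhythm, pushing rhythm information into its own representation.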